Implementing the CDISC Library RESTful API in R: Automated Access to Metadata Repositories and Controlled Terminologies


PharmaSUG SDE

Jagadish Katam
12-Apr-2025

CDISC Library


  • The CDISC Library is a single, trusted, authoritative source of CDISC Data Standards metadata and Controlled Terminology.

  • It is a cloud-based metadata repository (MDR) on the Microsoft Azure platform.

  • CDISC Library is composed of

    1. Data Standards Browser (DSB) and
    2. API

  • Users can browse and retrieve metadata such as CDASH, SDTM, ADaM, QRS, Controlled Terminology, etc.

  • CDISC publishes its data standards in PDF format. The Implementation Guides (IGs) and all applicable content are now also available in machine-readable format.













Data Standards Browser (DSB)














API


  • API stands for Application Programming Interface.
  • APIs allow two applications to communicate with each other.
  • It is a contract between the client and server.
  • The client sends a request through the API, and after performing the action, the API sends back a response to the client.

  • Types of Application Programming Interface

    • Private API
    • Public API
    • Partner API













Application Programming Interface (API)


Source: https://www.postman.com/what-is-an-api/













What is REST?


  • REST stands for REpresentational State Transfer.
  • It is an architectural style for web APIs: the client sends a request to a resource identified by a URL and receives a response. In a REST API, this interaction takes place over HTTP.
  • Like any API, REST allows computers to talk to each other over a network.
  • The response body is typically returned as XML or JavaScript Object Notation (JSON).
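
To make the JSON point concrete, here is a minimal sketch of how a JSON response body maps onto R objects. It assumes the jsonlite package is installed, and the payload is a made-up fragment for illustration, not an actual CDISC Library response.

```r
library(jsonlite)

# Hypothetical JSON body, loosely shaped like an API response (not real CDISC output)
body <- '{"name": "DM", "label": "Demographics", "variables": [{"name": "USUBJID"}, {"name": "SEX"}]}'

# simplifyVector = FALSE keeps JSON arrays as lists of lists instead of collapsing them
parsed <- fromJSON(body, simplifyVector = FALSE)

parsed$name                 # JSON string -> character scalar
length(parsed$variables)    # JSON array  -> unnamed list, one element per object
parsed$variables[[2]]$name  # nested objects are reached with $ and [[ ]]
```

Keeping arrays as lists mirrors the nested-list structure that httr2's resp_body_json() returns by default.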

Source: https://www.slideshare.net/slideshow/what-is-rest-api-rest-api-concepts-and-examples-edureka/174179563













HTTP CODES


Some common HTTP status codes used by REST APIs:

200 - “OK”.
201 - “Created” (used with POST).
400 - “Bad Request” (for example, missing required parameters).
401 - “Unauthorized” (missing or invalid authentication credentials).
403 - “Forbidden” (authenticated, but lacking the required privileges).
404 - “Not Found”.
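
As a rough illustration of how a client might react to these codes, the sketch below maps a status code to its description and flags success. The helpers describe_status() and is_success() are hypothetical names introduced here; with httr2, the numeric code of a response can be obtained with resp_status(resp).

```r
# Hypothetical helper: map an HTTP status code to a short description
# (codes taken from the list above)
describe_status <- function(code) {
  switch(
    as.character(code),
    "200" = "OK",
    "201" = "Created",
    "400" = "Bad Request",
    "401" = "Unauthorized",
    "403" = "Forbidden",
    "404" = "Not Found",
    "Other"  # fallback for codes not in the list
  )
}

# Hypothetical helper: TRUE for 2xx codes, which signal success
is_success <- function(code) code >= 200 && code < 300

describe_status(401)
is_success(201)
```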













HTTP Method - GET

























Endpoint and Parameters


























CDISC Library API














CDISC Library Access


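# Load httr2 for building and sending the request
library(httr2)

# API URL
url <- "https://api.library.cdisc.org/api/mdr/sdtmig/3-4"

# Construct the request. The API key is read from an environment variable
# (assumed here to be named CDISC_API_KEY) instead of being hard-coded.
req <- request(url) |>
  req_headers(
    'Cache-Control' = 'no-cache',
    'api-key' = Sys.getenv("CDISC_API_KEY"),
    'Accept' = 'application/json'
  )

# Send the request and fetch response
resp <- req |> req_perform()

# Parse JSON response into a nested R list
json_list <- resp |> resp_body_json()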
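
The URL in the request is a base address followed by path segments (product, then version). A small sketch of building such endpoints is shown here; cdisc_endpoint() is a hypothetical helper, not part of httr2 or of any CDISC package, and only the two paths used in this deck are shown.

```r
# Base URL taken from the request above
cdisc_base <- "https://api.library.cdisc.org/api/mdr"

# Hypothetical helper: join path segments onto the base URL
cdisc_endpoint <- function(...) {
  paste(c(cdisc_base, ...), collapse = "/")
}

cdisc_endpoint("sdtmig", "3-4")
cdisc_endpoint("ct", "packages", "sdtmct-2024-09-27")
```

The resulting string can be passed to request() in place of a hard-coded URL.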













Structure of the JSON file














Extract data from the JSON file


The JSON data is loaded into R as a nested list: JSON objects become named lists (key-value pairs) and JSON arrays become unnamed lists (lists of lists). We can convert, manipulate, or flatten it for further use.
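
Before flattening, it can help to inspect how the nesting looks. The sketch below uses a made-up list standing in for a parsed response (the names and values are illustrative, not real CDISC content) and relies on base R only.

```r
# Made-up nested list standing in for a parsed JSON response
toy <- list(
  name = "Example IG",
  classes = list(
    list(label = "Class A", datasets = list(list(name = "AA"), list(name = "AB"))),
    list(label = "Class B", datasets = list(list(name = "BA")))
  )
)

names(toy)                   # a JSON object becomes a named list: keys are the names
is.null(names(toy$classes))  # a JSON array becomes an unnamed list, so this is TRUE

# Number of datasets held by each class
n_datasets <- lengths(lapply(toy$classes, `[[`, "datasets"))
n_datasets

# Compact overview, limited to the top two levels of nesting
str(toy, max.level = 2)
```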


# purrr supplies map_dfr() and %||% (default for NULL fields)
library(purrr)

# Convert list of lists into a data frame
# seq_along() is used instead of 1:length(), which yields c(1, 0) for an empty list
df <- map_dfr(seq_along(json_list$classes), function(i) {  # Loop through classes
  class_item <- json_list$classes[[i]]  # Extract class

  map_dfr(seq_along(class_item$datasets), function(j) {  # Loop through datasets in each class
    dataset_item <- class_item$datasets[[j]]  # Extract dataset

    # Create a data frame with required fields
    data.frame(
      class = class_item$label,
      datastructure = dataset_item$datasetStructure %||% NA,
      description = dataset_item$description %||% NA,
      label = dataset_item$label %||% NA,
      name = dataset_item$name %||% NA,
      ordinal = dataset_item$ordinal %||% NA,
      stringsAsFactors = FALSE
    )
  })
})













SDTMIG Classes and Datasets














Extract SDTMIG Datasets and Variables from JSON


# purrr supplies map_dfr() and %||%; dplyr supplies arrange()
library(purrr)
library(dplyr)

# Convert list of lists into a data frame
# seq_along() is used instead of 1:length(), which yields c(1, 0) for an empty list
dataset_df <- map_dfr(seq_along(json_list$classes), function(i) {
  class_data <- json_list$classes[[i]]

  map_dfr(seq_along(class_data$datasets), function(j) {
    dataset <- class_data$datasets[[j]]

    if (is.null(dataset$datasetVariables)) {
      return(NULL)  # Skip datasets with no variables
    }

    # Extract dataset variables
    map_dfr(seq_along(dataset$datasetVariables), function(x) {
      var <- dataset$datasetVariables[[x]]

      # Extract codelist href if available, otherwise NA
      href_value <- if (!is.null(var$`_links`$codelist) && length(var$`_links`$codelist) > 0) {
        var$`_links`$codelist[[1]]$href %||% NA_character_
      } else {
        NA_character_
      }

      data.frame(
        dataset = dataset$name,
        # Default applied before as.numeric(): as.numeric(NULL) is numeric(0), not NULL
        Ordinal = as.numeric(var$ordinal %||% NA),
        Name = var$name %||% NA,
        Label = var$label %||% NA,
        Description = var$description %||% NA,
        Datatype = var$simpleDatatype %||% NA,
        Role = var$role %||% NA,
        core = var$core %||% NA,
        # Keep the trailing C-code of the href as the codelist identifier
        Codelist = stringr::str_extract(href_value, 'C\\d+$'),
        stringsAsFactors = FALSE
      )
    })

  })
}) |> arrange(dataset, Ordinal)
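
The Codelist column relies on a pattern that keeps only the trailing C-code of the codelist link. The sketch below shows the same idea in base R on hypothetical href values; the exact path layout of real links is an assumption here, and extract_codelist() is an illustrative helper name.

```r
# Hypothetical href values, shaped like codelist links (path layout is assumed)
hrefs <- c(
  "/mdr/root/ct/sdtmct/codelists/C66731",
  "/mdr/root/ct/sdtmct/codelists/C74457",
  NA
)

# Base R counterpart of stringr::str_extract(href, "C\\d+$"):
# return the trailing C-code, or NA when the value is missing or does not match
extract_codelist <- function(href) {
  out <- rep(NA_character_, length(href))
  hit <- !is.na(href) & grepl("C[0-9]+$", href)
  out[hit] <- regmatches(href[hit], regexpr("C[0-9]+$", href[hit]))
  out
}

extract_codelist(hrefs)
```

Anchoring the pattern with `$` means only a code at the very end of the link is picked up.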













SDTMIG Datasets & Variables














Controlled Terminology API Request


# Load httr2 for building and sending the request
library(httr2)

# API URL
url <- "https://api.library.cdisc.org/api/mdr/ct/packages/sdtmct-2024-09-27"

# Construct the request. The API key is read from an environment variable
# (assumed here to be named CDISC_API_KEY) instead of being hard-coded.
req <- request(url) |>
  req_headers(
    'Cache-Control' = 'no-cache',
    'api-key' = Sys.getenv("CDISC_API_KEY"),
    'Accept' = 'application/json'
  )

# Send the request and fetch response
resp <- req |> req_perform()

# Parse JSON response into a nested R list
ct_list <- resp |> resp_body_json()













Controlled Terminology Response returns JSON


# purrr supplies map_dfr() and %||%; dplyr supplies bind_cols()
library(purrr)
library(dplyr)

# Get length of codelists
codelist_count <- length(ct_list$codelists)

# Convert all nested lists into a data frame
# seq_len() is used instead of 1:n, which yields c(1, 0) when n is 0
ct_codelist_df <- map_dfr(seq_len(codelist_count), function(i) {

  # Extract current codelist
  ctcodelist <- ct_list$codelists[[i]]

  # Extract codelist-level details
  ct_codelist_info <- data.frame(
    codelist = ctcodelist$conceptId %||% NA,
    definition = ctcodelist$definition %||% NA,
    extensible = ctcodelist$extensible %||% NA,
    name = ctcodelist$name %||% NA,
    nci_preferred_Term = ctcodelist$preferredTerm %||% NA,
    submission_Value = ctcodelist$submissionValue %||% NA,
    stringsAsFactors = FALSE
  )

  # Get length of terms
  terms_count <- length(ctcodelist$terms)

  # Extract term-level details (if available)
  terms_df <- map_dfr(seq_len(terms_count), function(j) {
    term <- ctcodelist$terms[[j]]
    data.frame(
      term = term$conceptId %||% NA,
      term_definition = term$definition %||% NA,
      term_nci_preferred_Term = term$preferredTerm %||% NA,
      term_submission_Value = term$submissionValue %||% NA,
      stringsAsFactors = FALSE
    )
  })

  # Merge codelist details with terms (repeat codelist info for each term)
  bind_cols(ct_codelist_info, terms_df)
})
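
One possible next step is to link variables to their permitted terms through the codelist identifier. The sketch below does this in base R on toy stand-ins for the variable-level and term-level data frames; all values, including the C-codes, are made up for illustration.

```r
# Toy stand-in for the variable-level data frame (made-up values)
vars <- data.frame(
  dataset  = c("XX", "XX", "XX"),
  Name     = c("XXVAR1", "XXVAR2", "XXVAR3"),
  Codelist = c("C00001", NA, "C00002"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the codelist/term data frame (made-up values)
terms <- data.frame(
  codelist = c("C00001", "C00001", "C00002"),
  term_submission_Value = c("A", "B", "Y"),
  stringsAsFactors = FALSE
)

# Left join: keep every variable and attach terms where a codelist exists
var_terms <- merge(vars, terms, by.x = "Codelist", by.y = "codelist", all.x = TRUE)

# Terms available for one variable
var_terms[var_terms$Name == "XXVAR1", ]
```

Variables without a codelist stay in the result with a missing term, so nothing is silently dropped.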













Controlled Terminology Response returns JSON













References


  • SlideShare User. Introduction to APIs (Application Programming Interface). SlideShare. Accessed [Date]. https://www.slideshare.net/slideshow/introduction-to-apis-application-programming-interface/229843498.











































Conclusion


  • Provides direct access to the latest official CDISC standards (e.g., SDTM, ADaM, Define-XML).

  • Helps enforce standardized data structures in clinical trials.

  • Reduces manual errors in implementing CDISC models.

  • Provides structured JSON responses instead of PDFs or static documents.

  • Enables programmatic integration with clinical trial software and tools.

  • Access controlled terminology, dataset structures, and variable definitions.